In a time when depression, suicide, epidemics and wars have become everyday words, people seem increasingly tense about the future. Worry and anxiety seem to rule people's hearts, and more and more people are becoming mentally and physically ill. As we are neither medical students nor experts in this field, we instead thought about ways to increase people's Happiness. This sounds straightforward, but we are often unaware of the factors that actually make people happier.
We wanted to understand which factors have a positive impact on Happiness. We therefore took an interest in a study that explains Happiness through several factors. This work provides a Happiness score for each country, and therefore a ranking of the countries as well. We first intended to use that study directly, but since it already presented the results we were after, we decided to start from scratch: we looked for the raw datasets behind these factors and kept only the Happiness score of each country. We use some of the variables from the study and also propose new ones of our own choosing. This topic was interesting to work on and also helped us develop our skills in R.
Overall, we work at a global scale so that we can compare and contrast the results obtained for different regions.
Although reports on the subject have been published regularly over the last decade, we set this assignment in the year 2017 for two reasons. First, the data we gathered from the different sources are more complete for 2017 than for any other year. Second, the COVID-19 crisis emerged worldwide shortly afterwards, and data from 2020 onwards could be significantly distorted by the ever-changing restrictions that societies faced. Although there are always international problems affecting some part of the world, the pandemic was an event of exceptional scale that will remain in our history. Because we want to avoid any significant bias, we focus our project on 2017, a year without a global disruption of that kind.
The main objective of this analysis is to evaluate how much each variable contributes to the final measurement of Happiness. Moreover, we want to highlight and explain the factors that most affect people. Finally, we want to discover which factors are most likely to increase the final score, and to look for trends within each continent by grouping the countries.
We formulated the following questions that we want our work to answer:
Which factors is Happiness correlated with?
Which socio-economic factors should national authorities focus on in order to increase people's Happiness?
Why do the main factors not have the same impact on different continents?
For this part of the project, we decided to present the different datasets as tables. To do so, we used the function kable together with the package kableExtra, which gives us a table listing each variable and its definition.
This dataset is the one that gave us the idea for the project. However, since it already presented almost all the results, we used it as inspiration: we searched for the raw datasets of the factors and then explained the Happiness Score ourselves with our own selection of factors. We also omitted some of its variables, such as Whisker.high, Whisker.low, Freedom, Generosity and Government Corruption.
The variables we kept, listed below, are Country, Happiness Rank and Happiness Score. We kept the score from this dataset because it comes from a survey carried out in 2017; since such survey results are difficult to reproduce, we kept the original country scores. To select the variables we wanted, we simply subset the columns of interest with brackets [ ], giving the numbers of the columns to keep.
| Variable | Definition |
|---|---|
| Country | Name of the country |
| Happiness Rank | Rank of the country based on the Happiness Score |
| Happiness Score | A metric measured in 2017 by asking the sampled people the question: How would you rate your happiness on a scale of 0 to 10 where 10 is the happiest. |
Source : https://www.kaggle.com/datasets/unsdsn/world-happiness
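A minimal base-R sketch of this column selection. It uses a small hand-built data frame in place of the Kaggle file: the rank and score values are copied from the merged table further below, while Whisker.high and Whisker.low are placeholder columns filled with NA, present only so that they can be dropped.

```r
# Toy stand-in for the 2017 Kaggle file: ranks and scores are taken from
# the merged table in this report; the whisker columns are placeholders.
hs_raw <- data.frame(
  Country         = c("Norway", "Denmark", "Iceland"),
  Happiness.Rank  = c(1, 2, 3),
  Happiness.Score = c(7.54, 7.52, 7.50),
  Whisker.high    = NA,  # placeholder, dropped below
  Whisker.low     = NA   # placeholder, dropped below
)

# Select the first three columns by position with brackets ...
hs <- hs_raw[, 1:3]

# ... or by name, which keeps working if the column order of the file changes
hs_by_name <- hs_raw[, c("Country", "Happiness.Rank", "Happiness.Score")]

str(hs)
```

Selecting by name is a design choice worth considering: column positions shift silently if a later release of the file adds or reorders columns.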
For the following datasets from World Bank Open Data, we applied the same data-wrangling steps. First, because of a recurring error message and because some variables were useless for our project, we removed several columns from the original datasets. Then we kept only the country names and the 2017 value of each factor, once again by selecting the desired columns with brackets. Finally, we renamed the columns with the function colnames to make them easier to understand.
This dataset presents life expectancy at birth, in years, for the countries of the world from 1960 to 2021. We wanted to focus on the values themselves, so we did not need the Country Code, Indicator Name and Indicator Code columns. What remains is the name of the country and the corresponding value for 2017.
| Variable | Definition |
|---|---|
| Country | Name of the country |
| 2017 | Life Expectancy at birth (years) in 2017 |
Source : https://data.worldbank.org/indicator/SP.DYN.LE00.IN?most_recent_year_desc=false
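The same steps can be sketched in base R on a toy data frame laid out like a World Bank indicator file. Two assumptions are made: the file has already been read with read.csv, whose default check.names = TRUE turns a column named 2017 into X2017, and a single extra year column (X2016, filled with NA) stands in for the other years. The two 2017 values come from the merged table below.

```r
# Toy World Bank layout after read.csv(): "2017" has become "X2017"
wb <- data.frame(
  Country.Name   = c("Albania", "Algeria"),
  Country.Code   = c("ALB", "DZA"),
  Indicator.Name = "Life expectancy at birth, total (years)",
  Indicator.Code = "SP.DYN.LE00.IN",
  X2016          = NA,              # placeholder for the other years
  X2017          = c(78.33, 76.50)  # values from the merged table
)

# Keep only the country name and the 2017 value
life_exp <- wb[, c("Country.Name", "X2017")]

# Rename the columns for readability
colnames(life_exp) <- c("Country", "Life_exp")
life_exp
```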
This dataset covers government spending on education throughout the world from 1960 to 2021, expressed as a percentage of GDP.
| Variable | Definition |
|---|---|
| Country | Name of the country |
| 2017 | Government spending on education in 2017 (% of GDP) |
This dataset provides GDP per capita for each country from 1960 to 2021.
| Variable | Definition |
|---|---|
| Country | Name of the country |
| 2017 | GDP per capita in 2017 (current US$) |
Source : https://data.worldbank.org/indicator/NY.GDP.PCAP.CD?end=2021&most_recent_year_desc=false&start=2015
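For this factor, we first had two separate datasets: the percentage of wage and salaried workers among employed men and the same percentage among employed women. What interested us, however, was the link between Happiness and the gap between men and women. We therefore collected both datasets and computed the difference (male minus female) in a separate sheet in Excel. Our final table presents this gap in percentage points. Note that, as their World Bank indicator codes show (SL.EMP.WORK.MA.ZS and SL.EMP.WORK.FE.ZS), these two series measure the share of wage and salaried workers in male and female employment rather than pay levels; the "Gender Wage Gap" used in this report is therefore a gap in wage employment, not a pay gap.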
| Variable | Definition |
|---|---|
| Country | Name of the country |
| 2017 | Gap between the male and female shares of wage and salaried workers in 2017 (male minus female, percentage points) |
Sources : https://data.worldbank.org/indicator/SL.EMP.WORK.MA.ZS?end=2018&most_recent_year_desc=false&start=2017, https://data.worldbank.org/indicator/SL.EMP.WORK.FE.ZS?end=2018&most_recent_year_desc=false&start=2017
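The Excel step could also be done directly in R. The sketch below assumes each series has already been reduced to a Country column and a 2017 column, as for the other World Bank files; the countries A, B and C and their values are hypothetical.

```r
# Hypothetical 2017 shares (%) of wage and salaried workers per sex
male   <- data.frame(Country = c("A", "B", "C"), Male   = c(50, 80, 60))
female <- data.frame(Country = c("A", "B", "C"), Female = c(30, 85, 60))

# Align both series on the country name (merge also sorts by Country)
wages <- merge(male, female, by = "Country")

# Gap = male minus female; negative when the female share is higher
wages$Wages <- wages$Male - wages$Female
wages
```

Merging on Country rather than pasting the two columns side by side guarantees that each difference is computed within the same country, even if the two files list countries in a different order.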
This dataset gives carbon dioxide emissions in tonnes per capita for 2017. As for the other datasets, we removed the unnecessary variables and kept those that interested us. The removed variables are Indicator, Subject, Measure, Frequency and Flag Codes, so the remaining ones are Location (renamed Country), Time (renamed Year) and Value. The renaming is done once again with the function colnames. Moreover, the country variable was given as country codes, so we also had to convert the codes into the corresponding country names. For that, we installed the package countrycode and used its function of the same name.
| Variable | Definition |
|---|---|
| Country | Name of the country |
| Year | Year 2017 |
| Value | CO2 emissions in 2017 (tonnes per capita) |
Source : https://data.oecd.org/air/air-and-ghg-emissions.htm
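A sketch of these steps on a toy data frame. The column names LOCATION, INDICATOR, SUBJECT, MEASURE, FREQUENCY, TIME, Value and Flag.Codes are assumed to match the OECD extract after read.csv (which turns "Flag Codes" into Flag.Codes). The removed columns are filled with NA placeholders, and the two values come from the merged table below. The conversion uses countrycode's "iso3c" and "country.name" code schemes and is skipped if the package is not installed.

```r
# Toy stand-in for the OECD extract; only LOCATION, TIME and Value matter
oecd <- data.frame(
  LOCATION   = c("AUS", "USA"),
  INDICATOR  = NA, SUBJECT = NA, MEASURE = NA, FREQUENCY = NA,  # placeholders
  TIME       = c(2017, 2017),
  Value      = c(15.97, 14.64),  # tonnes per capita, from the merged table
  Flag.Codes = NA                # placeholder
)

# Keep the three useful columns and give them readable names
co2 <- oecd[, c("LOCATION", "TIME", "Value")]
colnames(co2) <- c("Country", "Year", "Value")

# Convert ISO3 codes (e.g. "AUS") to country names (e.g. "Australia")
if (requireNamespace("countrycode", quietly = TRUE)) {
  co2$Country <- countrycode::countrycode(co2$Country,
                                          origin = "iso3c",
                                          destination = "country.name")
}
co2
```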
This dataset provided insightful information on many socio-economic and ecological factors that could help explain Happiness. However, since it contains many variables (62) and we would not use all of them, we reduced their number and focused on two variables that we found interesting, listed in the table below. Once again, we subset the columns with brackets.
| Variable | Definition |
|---|---|
| Country | Name of the country |
| Year | Year the value was observed |
| Internet | Proportion of the population covered by at least a 3G mobile network (%) |
| Renewable Energy | Renewable energy consumption (% of total final energy consumption) |
Source : https://www.kaggle.com/datasets/truecue/worldsustainabilitydataset
This dataset provides the forest area of each country as a percentage of its land area. It interested us because we were looking for data on nature and ecological factors, and this one seemed an unusual yet intriguing variable to link to Happiness.
| Variable | Definition |
|---|---|
| Country | Name of the country |
| 2017 | Forest area in 2017 (% of land area) |
Source : https://data.worldbank.org/indicator/AG.LND.FRST.ZS
Let us now explore our data! We want to understand the impact of our different factors on Happiness. To do so, we merged all our variables with the original Happiness score into a single dataset, called HS_172, in order to study the relationships between variables. First, we converted the values of our datasets to the "numeric" type, since they were initially of "character" type, which we could not work with; we used the function as.numeric for every factor. Then we used the function merge to build HS_172, merging the different factors on their Country variable. After that, we renamed the columns after the factors merged into the dataset, with the function colnames. We were then faced with some NA values, which we tried to remove with the function na.omit. This did not remove everything, however, because some observations were still empty (blank cells are not treated as NA). We therefore wrote a loop combined with the function ifelse to delete these blank cells. Finally, we noticed that in our factors' datasets, Russia appeared under the name "Russian Federation". To avoid problems during the EDA, especially with the maps, we renamed it "Russia", again using brackets.
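The pipeline above can be sketched in base R as follows. This is an illustration under stated assumptions, not the report's exact code: the values are taken from the merged table, the blank life expectancy for Haiti is artificial (it stands for the empty cells that na.omit did not catch), the Russian Federation row has a placeholder NA value, and the helper to_num is hypothetical. Instead of a loop, it uses a vectorised ifelse that turns blank strings into NA before the numeric conversion, so that na.omit can then remove them.

```r
# Toy inputs stored as character, as after import; values are from the
# merged table except Haiti's blank life expectancy, which is artificial
happiness <- data.frame(Country = c("Albania", "Haiti", "Norway"),
                        Happiness.Score = c(4.64, 3.60, 7.54))
gdp  <- data.frame(Country = c("Albania", "Haiti", "Norway"),
                   V2017   = c("4531.02", "1369.06", "75496.75"))
life <- data.frame(Country = c("Albania", "Haiti", "Norway", "Russian Federation"),
                   V2017   = c("78.33", "", "82.61", NA))  # NA is a placeholder

# Hypothetical helper: blank strings become NA, then character -> numeric
to_num <- function(x) as.numeric(ifelse(trimws(x) == "", NA, x))
gdp$V2017  <- to_num(gdp$V2017)
life$V2017 <- to_num(life$V2017)

# Harmonise country names across sources, using brackets
life$Country[life$Country == "Russian Federation"] <- "Russia"

# Merge on Country (inner join) and name each new column after its factor
HS_172 <- merge(happiness, gdp, by = "Country")
colnames(HS_172)[colnames(HS_172) == "V2017"] <- "GDP"
HS_172 <- merge(HS_172, life, by = "Country")
colnames(HS_172)[colnames(HS_172) == "V2017"] <- "Life_exp"

# Drop countries with any missing value (Haiti, via its blank cell)
HS_172 <- na.omit(HS_172)
HS_172
```

Note that merge performs an inner join by default, so a country missing from any one source (here Russia, absent from the toy Happiness table) disappears from HS_172 even before na.omit is applied.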
Finally, to display the merged dataset, we created a table with the function kable, setting its style with kable_styling and its size with scroll_box from the package kableExtra. The values are rounded to 2 decimal places with the function round.
| Country | Happiness.Rank | Happiness.Score | Education | Forest_per | GDP | Life_exp | Wages | Internet | EnergyE | Pollution |
|---|---|---|---|---|---|---|---|---|---|---|
| Albania | 109 | 4.64 | 3.61 | 28.79 | 4531.02 | 78.33 | -3.74 | 62.40 | 37.22 | 1.53 |
| Algeria | 53 | 5.87 | 6.51 | 0.82 | 4109.70 | 76.50 | -7.53 | 47.69 | 0.14 | 3.15 |
| Angola | 140 | 3.80 | 2.47 | 54.76 | 2313.22 | 60.38 | 19.04 | 32.00 | 56.18 | 0.56 |
| Argentina | 24 | 6.60 | 5.45 | 10.56 | 14613.04 | 76.37 | -6.68 | 74.29 | 10.37 | 3.95 |
| Armenia | 121 | 4.38 | 2.71 | 11.56 | 3914.53 | 74.80 | -2.29 | 64.74 | 12.56 | 1.75 |
| Australia | 10 | 7.28 | 5.14 | 17.42 | 53934.25 | 82.50 | -8.30 | 86.55 | 9.69 | 15.97 |
| Austria | 13 | 7.01 | 5.37 | 47.12 | 47429.16 | 81.64 | -4.77 | 87.94 | 33.96 | 7.30 |
| Azerbaijan | 85 | 5.23 | 2.47 | 13.27 | 4147.09 | 72.69 | 8.84 | 79.00 | 1.91 | 3.12 |
| Bahrain | 41 | 6.09 | 2.32 | 0.82 | 23742.94 | 77.03 | 0.46 | 95.88 | 0.00 | 19.92 |
| Belarus | 67 | 5.57 | 4.79 | 42.98 | 5785.67 | 74.13 | -3.10 | 74.44 | 7.29 | 5.71 |
| Belgium | 17 | 6.89 | 6.43 | 22.76 | 44198.48 | 81.49 | -6.73 | 87.68 | 9.64 | 7.94 |
| Benin | 143 | 3.66 | 3.54 | 29.13 | 1136.59 | 61.17 | 11.01 | 13.30 | 45.38 | 0.60 |
| Bolivia | 58 | 5.82 | 8.66 | 47.53 | 3351.12 | 70.94 | 5.13 | 43.83 | 7.46 | 1.80 |
| Botswana | 142 | 3.77 | 7.30 | 27.54 | 7296.09 | 68.81 | 3.36 | 41.41 | 28.37 | 3.33 |
| Brazil | 22 | 6.64 | 6.32 | 59.83 | 9928.68 | 75.46 | -10.25 | 67.47 | 45.44 | 2.09 |
| Bulgaria | 105 | 4.71 | 4.08 | 35.50 | 8366.29 | 74.81 | -5.36 | 63.41 | 17.08 | 6.03 |
| Cambodia | 129 | 4.17 | 3.20 | 48.35 | 1385.26 | 69.29 | 11.25 | 32.90 | 61.47 | 0.65 |
| Cameroon | 107 | 4.70 | 3.06 | 43.38 | 1469.45 | 58.51 | 16.54 | 23.20 | 80.34 | 0.26 |
| Chile | 20 | 6.65 | 5.42 | 24.00 | 14962.56 | 79.91 | -1.67 | 82.33 | 24.11 | 4.68 |
| China | 79 | 5.27 | 3.67 | 22.74 | 8816.99 | 76.47 | 1.74 | 54.30 | 12.86 | 6.68 |
| Colombia | 36 | 6.36 | 4.54 | 53.84 | 6376.71 | 76.92 | -3.67 | 62.26 | 32.53 | 1.45 |
| Costa Rica | 12 | 7.08 | 7.07 | 58.48 | 12225.57 | 79.91 | -5.48 | 71.58 | 36.20 | 1.55 |
| Croatia | 77 | 5.29 | 3.85 | 34.52 | 13629.29 | 77.83 | -4.43 | 67.10 | 29.80 | 3.91 |
| Cyprus | 65 | 5.62 | 5.72 | 18.68 | 26608.88 | 80.67 | -5.82 | 80.74 | 11.08 | 7.53 |
| Denmark | 2 | 7.52 | 7.75 | 15.64 | 57610.10 | 81.10 | -5.82 | 97.10 | 35.51 | 5.51 |
| Dominican Republic | 86 | 5.23 | 3.92 | 43.88 | 7609.35 | 73.69 | -21.99 | 67.57 | 16.98 | 2.05 |
| El Salvador | 45 | 6.00 | 3.73 | 28.83 | 3910.25 | 72.87 | 12.87 | 33.82 | 24.99 | 0.94 |
| Estonia | 66 | 5.61 | 4.96 | 57.04 | 20437.77 | 78.09 | -7.19 | 88.10 | 27.03 | 12.51 |
| Ethiopia | 119 | 4.46 | 5.65 | 15.32 | 768.52 | 65.87 | 5.13 | 18.62 | 90.27 | 0.12 |
| Finland | 5 | 7.47 | 6.36 | 73.73 | 46412.14 | 81.63 | -7.97 | 87.47 | 44.48 | 7.70 |
| France | 31 | 6.44 | 5.45 | 31.05 | 38781.05 | 82.58 | -5.92 | 80.50 | 14.14 | 4.64 |
| Gabon | 118 | 4.47 | 3.33 | 91.46 | 7230.43 | 65.84 | 6.35 | 50.32 | 90.12 | 1.22 |
| Georgia | 125 | 4.29 | 3.57 | 40.62 | 4357.00 | 73.41 | -0.44 | 59.71 | 28.03 | 2.45 |
| Germany | 16 | 6.95 | 4.87 | 32.68 | 44652.59 | 80.99 | -4.96 | 84.39 | 15.22 | 8.70 |
| Ghana | 131 | 4.12 | 3.53 | 35.00 | 2074.29 | 63.46 | 15.54 | 37.88 | 44.86 | 0.50 |
| Greece | 87 | 5.23 | 3.48 | 30.27 | 18582.09 | 81.29 | -8.31 | 69.89 | 16.38 | 5.87 |
| Guatemala | 29 | 6.45 | 2.95 | 33.25 | 4454.05 | 73.81 | 13.88 | 40.70 | 65.07 | 0.95 |
| Haiti | 145 | 3.60 | 1.50 | 12.94 | 1369.06 | 63.29 | 15.54 | 31.00 | 76.17 | 0.30 |
| Honduras | 91 | 5.18 | 4.94 | 57.40 | 2453.73 | 74.90 | 9.58 | 32.14 | 46.24 | 0.90 |
| Hungary | 75 | 5.32 | 4.61 | 22.54 | 14623.70 | 75.82 | -3.80 | 76.75 | 14.51 | 4.97 |
| Iceland | 3 | 7.50 | 7.58 | 0.49 | 72010.15 | 82.66 | -7.72 | 98.26 | 76.83 | 4.87 |
| India | 122 | 4.32 | 4.31 | 24.00 | 1980.67 | 69.17 | 1.03 | 18.20 | 32.21 | 1.63 |
| Indonesia | 81 | 5.26 | 2.67 | 50.04 | 3837.58 | 71.28 | 11.72 | 32.34 | 25.43 | 1.82 |
| Ireland | 15 | 6.98 | 3.51 | 11.18 | 69774.03 | 82.16 | -13.95 | 84.11 | 9.94 | 7.47 |
| Israel | 11 | 7.21 | 6.06 | 6.47 | 40774.13 | 82.55 | -6.35 | 81.58 | 3.85 | 7.34 |
| Italy | 48 | 5.96 | 4.04 | 31.80 | 32406.72 | 82.95 | -10.67 | 63.08 | 16.43 | 5.36 |
| Jamaica | 76 | 5.31 | 5.26 | 54.04 | 5070.10 | 74.27 | -11.44 | 55.07 | 10.72 | 2.35 |
| Japan | 51 | 5.92 | 3.13 | 68.41 | 38834.05 | 84.10 | -2.71 | 91.73 | 6.97 | 8.87 |
| Jordan | 74 | 5.34 | 3.23 | 1.10 | 4231.52 | 74.29 | -13.68 | 66.79 | 5.26 | 2.49 |
| Kazakhstan | 60 | 5.82 | 2.75 | 1.25 | 9247.58 | 72.95 | -1.72 | 76.43 | 1.99 | 11.42 |
| Kenya | 112 | 4.55 | 4.96 | 6.29 | 1633.49 | 65.91 | 15.87 | 17.83 | 71.32 | 0.34 |
| Kuwait | 39 | 6.11 | 6.37 | 0.35 | 29759.47 | 75.31 | -1.56 | 98.00 | 0.03 | 21.61 |
| Latvia | 54 | 5.85 | 4.37 | 54.63 | 15695.12 | 74.63 | -4.62 | 80.11 | 42.60 | 3.44 |
| Lebanon | 88 | 5.22 | 2.13 | 13.83 | 7776.03 | 78.83 | -28.37 | 78.18 | 3.95 | 3.95 |
| Lithuania | 52 | 5.90 | 3.81 | 35.06 | 16885.41 | 75.48 | -5.38 | 77.62 | 33.78 | 3.81 |
| Luxembourg | 18 | 6.86 | 3.49 | 34.45 | 110193.21 | 82.10 | -1.98 | 97.36 | 15.33 | 14.47 |
| Malaysia | 42 | 6.08 | 4.68 | 58.63 | 10259.30 | 75.83 | 0.76 | 80.14 | 5.22 | 6.78 |
| Malta | 27 | 6.53 | 4.65 | 1.31 | 28857.02 | 82.35 | -10.83 | 81.01 | 7.25 | 3.25 |
| Mexico | 25 | 6.58 | 4.52 | 33.99 | 9287.85 | 74.95 | 0.27 | 63.85 | 9.54 | 3.62 |
| Moldova | 56 | 5.84 | 5.62 | 11.75 | 3509.69 | 71.72 | -8.98 | 76.12 | 26.06 | 2.73 |
| Mongolia | 100 | 4.95 | 4.07 | 9.10 | 3687.10 | 69.51 | -6.81 | 23.71 | 3.60 | 6.19 |
| Morocco | 84 | 5.24 | 5.12 | 12.80 | 3035.45 | 76.22 | 13.74 | 61.76 | 10.42 | 1.63 |
| Mozambique | 113 | 4.55 | 5.51 | 47.57 | 461.41 | 59.31 | 18.48 | 7.80 | 68.51 | 0.22 |
| Namibia | 111 | 4.57 | 9.71 | 8.32 | 5367.11 | 63.02 | 10.73 | 36.84 | 29.41 | 1.59 |
| Nepal | 99 | 4.96 | 4.77 | 41.59 | 1048.45 | 70.17 | 23.00 | 21.40 | 76.41 | 0.37 |
| Netherlands | 6 | 7.38 | 5.18 | 10.89 | 48675.22 | 81.76 | -6.53 | 93.20 | 6.36 | 9.07 |
| New Zealand | 8 | 7.31 | 6.26 | 37.41 | 42925.00 | 81.66 | -6.77 | 90.81 | 30.45 | 6.67 |
| Nicaragua | 43 | 6.07 | 4.36 | 30.81 | 2159.16 | 74.07 | 7.81 | 27.86 | 48.87 | 0.80 |
| Niger | 135 | 4.03 | 2.58 | 0.88 | 517.77 | 61.60 | 5.50 | 10.22 | 79.44 | 0.08 |
| Norway | 1 | 7.54 | 7.91 | 33.29 | 75496.75 | 82.61 | -4.17 | 96.36 | 61.11 | 7.20 |
| Pakistan | 80 | 5.27 | 2.90 | 4.99 | 1631.53 | 66.95 | 18.41 | 13.78 | 42.09 | 0.88 |
| Panama | 30 | 6.45 | 2.88 | 57.31 | 15146.41 | 78.15 | -5.06 | 59.95 | 23.58 | 2.29 |
| Paraguay | 70 | 5.49 | 3.09 | 42.64 | 5678.87 | 73.99 | 3.10 | 61.08 | 60.11 | 1.12 |
| Peru | 63 | 5.72 | 3.93 | 56.90 | 6710.51 | 76.29 | 10.14 | 50.45 | 27.60 | 1.58 |
| Poland | 46 | 5.97 | 4.56 | 30.85 | 13864.68 | 77.75 | -7.83 | 75.99 | 11.19 | 7.96 |
| Portugal | 89 | 5.20 | 5.02 | 36.15 | 21490.43 | 81.42 | -8.38 | 73.79 | 24.42 | 4.93 |
| Qatar | 35 | 6.38 | 2.97 | 0.00 | 59124.87 | 79.98 | -0.02 | 97.39 | 0.00 | 30.55 |
| Romania | 57 | 5.82 | 3.10 | 30.12 | 10807.01 | 75.31 | -3.16 | 63.75 | 23.38 | 3.62 |
| Rwanda | 151 | 3.47 | 3.13 | 11.07 | 772.29 | 68.34 | 20.50 | 21.77 | 86.91 | 0.09 |
| Saudi Arabia | 37 | 6.34 | 8.02 | 0.45 | 20802.46 | 74.87 | -3.90 | 94.18 | 0.02 | 15.75 |
| Senegal | 115 | 4.53 | 4.62 | 42.53 | 1361.70 | 67.38 | 11.46 | 29.64 | 39.23 | 0.51 |
| Serbia | 73 | 5.39 | 3.71 | 31.12 | 6292.54 | 75.54 | -7.45 | 70.33 | 20.09 | 6.63 |
| Singapore | 26 | 6.57 | 2.77 | 22.63 | 61150.73 | 83.10 | -9.13 | 84.45 | 0.69 | 8.43 |
| Slovenia | 62 | 5.76 | 4.78 | 61.77 | 23514.03 | 81.03 | -5.39 | 78.89 | 19.67 | 6.64 |
| Spain | 34 | 6.40 | 4.21 | 37.15 | 28170.17 | 83.28 | -7.90 | 84.60 | 15.68 | 5.49 |
| Sweden | 9 | 7.28 | 7.57 | 68.69 | 53791.51 | 82.41 | -7.81 | 93.01 | 52.86 | 3.65 |
| Switzerland | 4 | 7.49 | 4.95 | 31.86 | 83352.09 | 83.55 | -3.82 | 89.69 | 25.00 | 4.37 |
| Tajikistan | 96 | 5.04 | 5.84 | 3.04 | 848.67 | 70.65 | 5.09 | 21.96 | 41.69 | 0.59 |
| Tanzania | 153 | 3.35 | 4.43 | 53.23 | 1004.91 | 64.48 | 7.61 | 16.00 | 83.83 | 0.18 |
| Thailand | 32 | 6.42 | 3.36 | 39.11 | 6593.82 | 76.68 | 0.08 | 52.89 | 22.27 | 3.53 |
| Togo | 150 | 3.49 | 3.76 | 22.40 | 830.75 | 60.49 | 23.74 | 12.36 | 77.74 | 0.16 |
| Ukraine | 132 | 4.10 | 5.42 | 16.69 | 2638.33 | 71.78 | -4.13 | 58.89 | 6.47 | 3.82 |
| United Kingdom | 19 | 6.71 | 5.38 | 13.08 | 40857.76 | 81.26 | -8.23 | 90.42 | 9.72 | 5.45 |
| United States | 14 | 6.99 | 5.11 | 33.87 | 59914.78 | 78.54 | -1.80 | 87.27 | 9.92 | 14.64 |
| Uruguay | 28 | 6.45 | 4.47 | 11.24 | 18690.89 | 77.63 | -6.80 | 70.32 | 60.68 | 1.68 |
| Uzbekistan | 47 | 5.97 | 5.03 | 8.20 | 1916.76 | 71.39 | 1.41 | 48.70 | 1.77 | 3.23 |
| Vietnam | 94 | 5.07 | 4.09 | 46.00 | 2974.12 | 75.24 | 9.91 | 58.14 | 31.98 | 1.96 |
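The rounding and table steps can be sketched as below, on a toy data frame whose extra decimals are illustrative. Only numeric columns are rounded, so the country names stay untouched. The kableExtra part runs only if the package is installed; the "striped" style and the 400px height are arbitrary choices, not the report's settings.

```r
# Toy merged data; the extra decimals are illustrative
df <- data.frame(Country = c("Norway", "Denmark"),
                 Happiness.Score = c(7.537, 7.522),
                 GDP = c(75496.7534, 57610.0981))

# Round every numeric column to 2 decimals, leaving text columns as they are
num_cols <- vapply(df, is.numeric, logical(1))
df[num_cols] <- lapply(df[num_cols], round, digits = 2)

# Optional HTML rendering with kableExtra (skipped if not installed)
if (requireNamespace("kableExtra", quietly = TRUE)) {
  tab <- knitr::kable(df, format = "html")
  tab <- kableExtra::kable_styling(tab, bootstrap_options = "striped")
  tab <- kableExtra::scroll_box(tab, width = "100%", height = "400px")
}
df
```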
The factors we decided to concentrate on in this study are the following: Life expectancy at birth, GDP per capita, Government spending on Education, Forest percentage, CO2 emissions, Access to the Internet, Renewable energy consumption and Gender Wage Gap.
First, we wanted to present these factors in a visually appealing way, in order to understand the situation worldwide as well as the interactions between the factors. We therefore created a world map for each factor. To create these maps, we downloaded a ".shp" file containing world country data. We then created a data frame from this file and added our factors' values to it with a sequence of calls to the function left_join. Finally, we used the package plotly to make the maps interactive.
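A minimal sketch of the join step. It uses base R's merge with all.x = TRUE, which behaves like dplyr's left_join(world, factors, by = c("NAME" = "Country")): every country of the map is kept, and countries without data get NA. The attribute column NAME is a hypothetical stand-in for the country-name field of the shapefile; the life expectancy values are from the merged table.

```r
# Hypothetical attribute table of the world shapefile (one row per country)
world <- data.frame(NAME = c("Norway", "Haiti", "Greenland"))

# One factor from the merged table
factors <- data.frame(Country  = c("Norway", "Haiti"),
                      Life_exp = c(82.61, 63.29))

# Left join: keep every map country; those without data get NA
world_data <- merge(world, factors, by.x = "NAME", by.y = "Country",
                    all.x = TRUE)
world_data
```

A left join is the natural choice here because dropping countries without data (as an inner join would) would also remove their polygons from the map.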
This map shows the life expectancy of each country in 2017. The world average is 72.39 years, with values ranging from 52.24 to 84.22 years. African countries stand out with the lowest values and an average of 62.18 years. The countries with the highest life expectancy are Hong Kong with 84.22 years and Japan with 84.09 years.
This map shows public spending on education as a percentage of GDP for each country, which measures the priority that governments give to educational services and institutions. The highest spending on education is found in Greenland with 11.3% and the lowest in Venezuela with 1.19%.
The map shows the gender wage gap across countries. The country with the lowest value, in dark blue, is Syria with -37.73%, while the country with the highest value is Mauritania with 28.52%, followed by Nepal with 23%, Papua New Guinea with 19.67%, Congo with 19.32% and Pakistan with 18.41%. Most of the countries with a high Gender Wage Gap are in Africa and Asia.
The map shows CO2 emissions per capita, in tonnes, across countries. Among the countries with the highest emissions, in light grey, are Australia with 15.97 tonnes per capita, Saudi Arabia with 15.75, Canada with 15.29 and the United States with 14.64. The countries with the lowest emissions, in black, include African countries such as Niger with 0.08, Madagascar with 0.13, Mozambique with 0.22 and Cameroon with 0.26 tonnes per capita. Central American countries also have low emissions, such as Nicaragua with 0.8, Honduras with 0.9, El Salvador with 0.94 and Guatemala with 0.95, as does Haiti in the Caribbean with 0.3. In Asia, Nepal emits 0.37, Bangladesh 0.54, Tajikistan 0.59, Cambodia 0.65 and Pakistan 0.88 tonnes per capita.
The map shows the proportion of the population covered by at least a 3G mobile network in each country. Among the highest values, in dark red, are Iceland with 98.26%, Denmark with 97.10%, Norway with 96.36%, Saudi Arabia with 94.18%, the Netherlands with 93.20%, Sweden with 93.01%, Canada with 92.07% and the United Kingdom with 90.42%, which indicates that most of the population is covered in many developed countries. The lowest values, in dark blue, are found in African countries such as Eritrea with 1.31%, Chad with 6.5%, Mozambique with 7.8% and Niger with 10.22%. Pakistan (13.78%) and India (18.2%) also have low coverage.
The map shows renewable energy consumption as a percentage of total final energy consumption. Africa concentrates the countries with the highest shares of renewable energy in the world: Uganda has the lightest colour with 90.66%, followed by Ethiopia with 90.27% and Gabon with 90.12%. The countries with the lowest shares, in dark blue, are mainly in the Middle East and North Africa: Oman has 0%, Saudi Arabia 0.02% and Algeria 0.14%.
The map shows the forest area of each country as a percentage of its land area. Greenland has the darkest green colour, with almost no forest (0.000535997%), followed by countries such as Oman with 0.01% and Egypt with 0.05%. Among the most forested countries, in the lightest colours, are Gabon with 91.46%, Papua New Guinea with 79.4%, Finland with 73.73% and Sweden with 68.68%.
It is important to see how the variables of the first dataset contribute to the final score of each country, since this lets us infer how relevant they are to people.
The two horizontal bar plots show how the contribution of each variable is distributed. In the first graph, which focuses on the 5 countries with the best Happiness score, the contributions of most variables are consistent across countries; the variable that varies the most is trust in government (Government Corruption). In contrast, in the second graph, which focuses on the 5 countries with the worst Happiness score, we can easily observe disparities and large differences in the contribution of each variable. Notably, no variable is consistent across these countries, and the one that varies the most is Dystopia (the residual).
The steps to plot these two graphs were the following. First, we did some data wrangling after loading the main dataset (the Happiness Score dataset): we subset the first 3 columns that interested us using brackets and renamed them. Then we corrected the dataset, because all the columns were of character type, and we added each country's continent with the help of the package countrycode. Next, we selected the countries with the best scores and transposed the data frame to make plotting easier. Finally, we used the function barplot to draw the stacked bar plot. We repeated the same process for the 5 worst countries.
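The transpose-then-stack idea can be sketched with base R's barplot. The component names and numbers below are hypothetical, not the Kaggle values; what matters is that after t() each column is one country and each row one component, which is the layout barplot stacks when beside = FALSE (its default).

```r
# Hypothetical contributions of four components to three countries' scores
contrib <- data.frame(
  Country  = c("Country A", "Country B", "Country C"),
  GDP      = c(1.6, 1.5, 1.4),
  Family   = c(1.5, 1.5, 1.4),
  Health   = c(0.8, 0.8, 0.8),
  Dystopia = c(2.3, 2.2, 2.4)
)

# Transpose the numeric part: rows = components, columns = countries
m <- t(as.matrix(contrib[, -1]))
colnames(m) <- contrib$Country

# Horizontal stacked bars: one bar per country, one segment per component
barplot(m, horiz = TRUE, las = 1, legend.text = rownames(m))
```

Each bar's total length is the sum of its column, so the bar lengths reproduce the countries' scores as sums of their components.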
We also wanted to group the countries by continent and present results for these groups. To do so, we again used the package countrycode, which gives us a new column with the continent of each country. Our goal was to show the average Happiness score of each continent. We therefore used the function filter to separate the continents, computed the average Happiness score of each one, and finally created a new dataset with all these values.
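The per-continent averages can also be computed in a single call. This sketch swaps the report's filter-per-continent approach for base R's aggregate; the continents are hard-coded here for six countries of the merged table, whereas the report derives them with countrycode (destination "continent").

```r
# Six countries from the merged table, continents hard-coded
hs <- data.frame(
  Country   = c("Norway", "Denmark", "Australia", "New Zealand",
                "Costa Rica", "Haiti"),
  Continent = c("Europe", "Europe", "Oceania", "Oceania",
                "Americas", "Americas"),
  Happiness.Score = c(7.54, 7.52, 7.28, 7.31, 7.08, 3.60)
)

# Mean Happiness score per continent (groups are sorted alphabetically)
cont_mean <- aggregate(Happiness.Score ~ Continent, data = hs, FUN = mean)
cont_mean
```

For Oceania this gives (7.28 + 7.31) / 2 = 7.295, consistent with the value of 7.3 discussed for that continent.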
The bar chart below, created with the package highcharter, shows the average Happiness score by continent. Oceania is the happiest continent, with the highest value of 7.3. However, since Oceania consists of only two countries in our data, Australia and New Zealand, whose Happiness scores are both well above average, the result for this continent must be interpreted with caution. Setting Oceania aside, Asia is the continent with the highest scores in terms of Happiness, followed by Europe and the Americas. Africa has the lowest average Happiness score.
Above, we saw in a simple way how the Happiness score varies by continent. To better observe the distribution of the observations within each continent, along with some important summary statistics, we used box plots. We used the package ggplot2 to add points to the box plots so that every observation is visible, and the package ggtext to change the size of the title.
The graph shows that there are few outliers in the Happiness score. We removed Oceania here, because a box plot of only 2 observations would not be useful. Among the other box plots, the Americas have the most concentrated data, since they have the smallest range (maximum value minus minimum value), indicating that this continent follows a trend without large fluctuations. Europe is the region with the most dispersed data (as shown by its IQR), which tells us that European countries show notable differences in their results. Finally, as in the other graphs, Africa lies below the other continents because of its low Happiness scores.
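The two spread measures used above can be computed directly. This sketch uses only eight countries from the merged table, so its numbers are not the report's results. It drops Oceania as the report does, computes the range (maximum minus minimum) and the IQR of each continent, and draws a base-R box plot in place of the ggplot2 version.

```r
# Eight countries from the merged table, with their continents hard-coded
hs <- data.frame(
  Continent = c(rep("Europe", 3), rep("Americas", 3), rep("Oceania", 2)),
  Happiness.Score = c(7.54, 5.23, 4.10,  # Norway, Greece, Ukraine
                      7.08, 6.64, 3.60,  # Costa Rica, Brazil, Haiti
                      7.28, 7.31)        # Australia, New Zealand
)

# Drop Oceania: two observations are too few for a box plot
hs <- subset(hs, Continent != "Oceania")

# Range (max minus min) and interquartile range per continent
by_cont <- split(hs$Happiness.Score, hs$Continent)
spread <- data.frame(
  Range = sapply(by_cont, function(x) diff(range(x))),
  IQR   = sapply(by_cont, IQR)
)
spread

# Base-R box plot of the remaining continents
boxplot(Happiness.Score ~ Continent, data = hs)
```

The range depends only on the two most extreme countries, whereas the IQR describes the middle half of the data, which is why the two measures can rank continents differently.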
Let us now continue our exploratory analysis with a basic scatterplot matrix of all our variables. In this plot, we look for any relationship with our response variable, Happiness.Score. We notice that Life_exp, Wages, Internet and Pollution seem to be related to the Happiness score.
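A basic scatterplot matrix can be drawn with base R's pairs. The sketch uses six countries from the merged table (Norway, Australia, Qatar, China, India and Haiti) and also prints each variable's Pearson correlation with Happiness.Score, as a quick numeric complement to the visual check.

```r
# Six countries from the merged table: Norway, Australia, Qatar,
# China, India and Haiti
hs <- data.frame(
  Happiness.Score = c(7.54, 7.28, 6.38, 5.27, 4.32, 3.60),
  Life_exp        = c(82.61, 82.50, 79.98, 76.47, 69.17, 63.29),
  Wages           = c(-4.17, -8.30, -0.02, 1.74, 1.03, 15.54),
  Internet        = c(96.36, 86.55, 97.39, 54.30, 18.20, 31.00),
  Pollution       = c(7.20, 15.97, 30.55, 6.68, 1.63, 0.30)
)

# One scatterplot for every pair of variables
pairs(hs)

# Pearson correlation of each factor with the Happiness score
r_score <- cor(hs)[-1, "Happiness.Score"]
round(r_score, 2)
```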
Let us then look at each variable that seems to be linked with the Happiness score. To present these links interactively, we once again used the package plotly.
We start with Life Expectancy at birth.
The previous chart shows that the Happiness score tends to increase as life expectancy at birth increases.
Next comes the link with the gender wage gap (male minus female).
Here too, we find an interesting result. Countries with a large gap between men and women (M-F) tend to have a low Happiness score, and conversely, countries with a small gap tend to have a higher Happiness score. The gap is computed as the male value minus the female value, so negative values mean that the female value exceeds the male one. Interestingly, countries with a slightly negative gap, where women are slightly ahead of men, seem to be happier than the others.
The next plot shows the relationship with the access to 3G Internet.
We observe a positive link: as more people have access to the Internet, the Happiness score tends to increase.
Finally, we study the link with CO2 emissions, which we use as a measure of air pollution.
Here the results are surprising: the plot suggests that countries with higher CO2 emissions, and therefore more air pollution, tend to be happier, which seems counter-intuitive. We will have to be careful with this variable when computing the regression.
A possible explanation is that most developed countries use resources and technologies that release more CO2 than, for example, African countries do. CO2 emissions would then reflect a country's level of development, which would explain why a country like Qatar, with very high emissions per capita, has a higher Happiness score than some European countries that are considered developed.
We must also be cautious about correlation between explanatory variables, which can lead to multicollinearity: the regression coefficients then become unstable and their standard errors inflated, which makes the effect of each variable hard to isolate. We notice correlations between the following pairs of variables: Life_exp and Internet, Life_exp and Pollution, and Internet and Pollution.
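A sketch of how such correlated pairs could be flagged, and of how much multicollinearity they cause, using six countries from the merged table (Norway, Australia, Qatar, China, India and Haiti). The 0.7 threshold is an arbitrary rule of thumb, not taken from the report. The variance inflation factor, VIF_j = 1 / (1 - R²_j), is computed with base R's lm, where R²_j comes from regressing variable j on the other explanatory variables; values well above 1 indicate that a coefficient's variance is inflated by collinearity. With only six observations, the numbers are purely illustrative.

```r
# Six countries from the merged table (values copied from it)
x <- data.frame(
  Life_exp  = c(82.61, 82.50, 79.98, 76.47, 69.17, 63.29),
  Internet  = c(96.36, 86.55, 97.39, 54.30, 18.20, 31.00),
  Pollution = c(7.20, 15.97, 30.55, 6.68, 1.63, 0.30)
)

# Pairwise correlations between explanatory variables
r <- cor(x)

# List the pairs whose absolute correlation exceeds the chosen threshold
threshold <- 0.7  # arbitrary cut-off, for illustration only
idx <- which(abs(r) > threshold & upper.tri(r), arr.ind = TRUE)
data.frame(var1 = rownames(r)[idx[, 1]],
           var2 = colnames(r)[idx[, 2]],
           r    = round(r[idx], 2))

# Variance inflation factor of each variable
vif <- sapply(names(x), function(v) {
  f  <- reformulate(setdiff(names(x), v), response = v)
  r2 <- summary(lm(f, data = x))$r.squared
  1 / (1 - r2)
})
round(vif, 2)
```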